
    A Recurrent Neural Network Survival Model: Predicting Web User Return Time

    The size of a website's active user base directly affects its value. Thus, it is important to monitor and influence a user's likelihood to return to a site. Essential to this is predicting when a user will return. Current state-of-the-art approaches to this problem come in two flavors: (1) Recurrent Neural Network (RNN) based solutions and (2) survival analysis methods. We observe that both techniques are severely limited when applied to this problem. Survival models can only incorporate aggregate representations of users instead of automatically learning a representation directly from a raw time series of user actions. RNNs can automatically learn features, but cannot be directly trained with examples of non-returning users who have no target value for their return time. We develop a novel RNN survival model that removes the limitations of the state-of-the-art methods. We demonstrate that this model can successfully be applied to return time prediction on a large e-commerce dataset with a superior ability to discriminate between returning and non-returning users than either method applied in isolation. Comment: Accepted into ECML PKDD 2018; 8 figures and 1 table
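
    The point about non-returning users can be illustrated with a censored likelihood, the standard survival-analysis device. The sketch below is not the authors' model: it assumes a single constant return rate (an exponential model) purely for illustration, and the names `censored_nll` and `mle_rate` are hypothetical. A user who returned contributes the log density of the observed return time; a user who has not returned contributes the log probability of staying away at least as long as the observation window.

```python
import math

def censored_nll(rate, durations, returned):
    """Negative log-likelihood of a constant-hazard (exponential) model.

    durations[i] is the observed time for user i; returned[i] is True if the
    user came back at that time, False if observation ended first (censored).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    nll = 0.0
    for t, observed in zip(durations, returned):
        if observed:
            # log density of returning exactly at time t
            nll -= math.log(rate) - rate * t
        else:
            # log probability of not having returned by time t
            nll -= -rate * t
    return nll

def mle_rate(durations, returned):
    """Closed-form maximiser of the likelihood above: events / total time."""
    events = sum(1 for r in returned if r)
    return events / sum(durations)

# Illustrative values only: one user returned after 2 days, one was still
# absent when observation stopped after 4 days.
rate_hat = mle_rate([2.0, 4.0], [True, False])
```

    A plain regression on return time would have to drop the second user; here that user still pulls the estimated rate down.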

    Study of Digital Competence of the Students and Teachers in Ukraine

    Professional fulfillment in a digital economy requires a high level of digital competency. One way to develop this competency is education. However, delivering digital education at a high level requires that both teachers and students are themselves digitally competent. This paper explains how the level of digital competency of teachers and students in Ukraine was determined according to the DigComp recommendations. We tried to identify the main factors that reflect the readiness of teachers and students for digital education, based on their self-evaluation. We also attempted to estimate the level of digital competency by analysing the results of case-study exercises. The combined analysis let us assess the connection between respondents' self-evaluation and their actual competencies. We provide a methodology and a model for determining competency levels by means of a survey, expert case rating and statistical analysis. On the basis of the results, the paper suggests further research prospects and recommendations for developing digital competency in educational institutions in Ukraine

    Calculating the sample size required for developing a clinical prediction model.

    Clinical prediction models aim to predict outcomes in individuals, to inform diagnosis or prognosis in healthcare. Hundreds of prediction models are published in the medical literature each year, yet many are developed using a dataset that is too small for the total number of participants or outcome events. This leads to inaccurate predictions and consequently incorrect healthcare decisions for some individuals. In this article, the authors provide guidance on how to calculate the sample size required to develop a clinical prediction model

    Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines

    Background: Multiple imputation (MI) provides an effective approach to handle missing covariate data within prognostic modelling studies, as it can properly account for the missing data uncertainty. The multiply imputed datasets are each analysed using standard prognostic modelling techniques to obtain the estimates of interest. The estimates from each imputed dataset are then combined into one overall estimate and variance, incorporating both the within and between imputation variability. Rubin's rules for combining these multiply imputed estimates are based on asymptotic theory. The resulting combined estimates may be more accurate if the posterior distribution of the population parameter of interest is better approximated by the normal distribution. However, the normality assumption may not be appropriate for all the parameters of interest when analysing prognostic modelling studies, such as predicted survival probabilities and model performance measures. Methods: Guidelines for combining the estimates of interest when analysing prognostic modelling studies are provided. A literature review is performed to identify current practice for combining such estimates in prognostic modelling studies. Results: Methods for combining all reported estimates after MI were not well reported in the current literature. Rubin's rules without applying any transformations were the standard approach used, when any method was stated. Conclusion: The proposed simple guidelines for combining estimates after MI may lead to a wider and more appropriate use of MI in future prognostic modelling studies
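
    The combining step described above can be sketched as follows. This is a minimal illustration of Rubin's rules, not code from the paper; the function names are hypothetical. The second helper pools a probability on the logit scale and back-transforms it, with the logit chosen here only as an example of a transformation (the abstract does not say which transformation the guidelines recommend) and the per-dataset variances converted by the delta method.

```python
import math

def pool_rubin(estimates, variances):
    """Combine m per-imputation estimates and their variances.

    Returns (pooled_estimate, total_variance), where the total variance is
    within-imputation variance plus (1 + 1/m) times between-imputation variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates, one variance each")
    q_bar = sum(estimates) / m                               # pooled estimate
    within = sum(variances) / m                              # mean within variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total

def pool_probability_logit(probs, variances):
    """Pool probabilities on the logit scale (an assumed, illustrative choice).

    Returns (pooled_probability, total_variance_on_logit_scale).
    """
    logits = [math.log(p / (1.0 - p)) for p in probs]
    # Delta method: Var(logit p) is approximately Var(p) / (p (1 - p))^2
    logit_vars = [v / (p * (1.0 - p)) ** 2 for p, v in zip(probs, variances)]
    q_bar, total = pool_rubin(logits, logit_vars)
    return 1.0 / (1.0 + math.exp(-q_bar)), total
```

    Pooling on a transformed scale keeps the combined probability inside (0, 1) and makes the normal approximation behind Rubin's rules more plausible for quantities such as predicted survival probabilities.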

    Divide-and-Rule: Self-Supervised Learning for Survival Analysis in Colorectal Cancer

    With the long-term rapid increase in the incidence of colorectal cancer (CRC), there is an urgent clinical need to improve risk stratification. The conventional pathology report is usually limited to only a few histopathological features. However, most of the tumor microenvironments used to describe patterns of aggressive tumor behavior are ignored. In this work, we aim to learn histopathological patterns within cancerous tissue regions that can be used to improve prognostic stratification for colorectal cancer. To do so, we propose a self-supervised learning method that jointly learns a representation of tissue regions as well as a metric of the clustering to obtain their underlying patterns. These histopathological patterns are then used to represent the interaction between complex tissues and predict clinical outcomes directly. We furthermore show that the proposed approach can benefit from linear predictors to avoid overfitting in patient outcome predictions. To this end, we introduce a new well-characterized clinicopathological dataset, including a retrospective cohort of 374 patients, with their survival time and treatment information. Histomorphological clusters obtained by our method are evaluated by training survival models. The experimental results demonstrate statistically significant patient stratification, and our approach outperformed the state-of-the-art deep clustering methods

    Exposure-Response Estimates for Diesel Engine Exhaust and Lung Cancer Mortality Based on Data from Three Occupational Cohorts

    Background: Diesel engine exhaust (DEE) has recently been classified as a known human carcinogen. Objective: We derived a meta-exposure–response curve (ERC) for DEE and lung cancer mortality and estimated lifetime excess risks (ELRs) of lung cancer mortality based on assumed occupational and environmental exposure scenarios. Methods: We conducted a meta-regression of lung cancer mortality and cumulative exposure to elemental carbon (EC), a proxy measure of DEE, based on relative risk (RR) estimates reported by three large occupational cohort studies (including two studies of workers in the trucking industry and one study of miners). Based on the derived risk function, we calculated ELRs for several lifetime occupational and environmental exposure scenarios and also calculated the fractions of annual lung cancer deaths attributable to DEE. Results: We estimated a lnRR of 0.00098 (95% CI: 0.00055, 0.0014) for lung cancer mortality with each 1-μg/m3-year increase in cumulative EC based on a linear meta-regression model. Corresponding lnRRs for the individual studies ranged from 0.00061 to 0.0012. Estimated numbers of excess lung cancer deaths through 80 years of age for lifetime occupational exposures of 1, 10, and 25 μg/m3 EC were 17, 200, and 689 per 10,000, respectively. For lifetime environmental exposure to 0.8 μg/m3 EC, we estimated 21 excess lung cancer deaths per 10,000. Based on broad assumptions regarding past occupational and environmental exposures, we estimated that approximately 6% of annual lung cancer deaths may be due to DEE exposure. Conclusions: Combined data from three U.S. occupational cohort studies suggest that DEE at levels common in the workplace and in outdoor air appears to pose substantial excess lifetime risks of lung cancer, above the usually acceptable limits in the United States and Europe, which are generally set at 1/1,000 and 1/100,000 based on lifetime exposure for the occupational and general population, respectively.
Citation: Vermeulen R, Silverman DT, Garshick E, Vlaanderen J, Portengen L, Steenland K. 2014. Exposure-response estimates for diesel engine exhaust and lung cancer mortality based on data from three occupational cohorts. Environ Health Perspect 122:172–177; http://dx.doi.org/10.1289/ehp.130688
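
    The reported slope can be turned into a relative risk for a given cumulative exposure, as in the sketch below. It uses only the lnRR and confidence limits quoted in the abstract and reads "linear meta-regression" as lnRR being linear in cumulative EC. The function name and the example exposure (10 μg/m3 for 45 working years) are assumptions for illustration. The sketch does not reproduce the excess lifetime risks, which also depend on background mortality rates that the abstract does not give.

```python
import math

# Values quoted in the abstract: lnRR per 1 ug/m3-year of cumulative EC.
LN_RR_PER_UNIT = 0.00098
LN_RR_CI = (0.00055, 0.0014)

def relative_risk(cumulative_ec, slope=LN_RR_PER_UNIT):
    """Relative risk at a cumulative EC exposure in ug/m3-years.

    Assumes lnRR grows linearly with cumulative exposure, so RR = exp(slope * x).
    """
    if cumulative_ec < 0:
        raise ValueError("cumulative exposure cannot be negative")
    return math.exp(slope * cumulative_ec)

# Hypothetical scenario, not taken from the abstract:
# 10 ug/m3 EC sustained over 45 working years.
exposure = 10.0 * 45.0
rr_point = relative_risk(exposure)
rr_low, rr_high = (relative_risk(exposure, s) for s in LN_RR_CI)
```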

    Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study

    Background: There is no consensus on the most appropriate approach to handle missing covariate data within prognostic modelling studies. Therefore a simulation study was performed to assess the effects of different missing data techniques on the performance of a prognostic model. Methods: Datasets were generated to resemble the skewed distributions seen in a motivating breast cancer example. Multivariate missing data were imposed on four covariates using four different mechanisms: missing completely at random (MCAR), missing at random (MAR), missing not at random (MNAR) and a combination of all three mechanisms. Five amounts of incomplete cases from 5% to 75% were considered. Complete case analysis (CC), single imputation (SI) and five multiple imputation (MI) techniques available within the R statistical software were investigated: a) data augmentation (DA) approach assuming a multivariate normal distribution, b) DA assuming a general location model, c) regression switching imputation, d) regression switching with predictive mean matching (MICE-PMM) and e) flexible additive imputation models. A Cox proportional hazards model was fitted and appropriate estimates for the regression coefficients and model performance measures were obtained. Results: Performing a CC analysis produced unbiased regression estimates, but inflated standard errors, which affected the significance of the covariates in the model with 25% or more missingness. Using SI underestimated the variability, resulting in poor coverage even with 10% missingness. Of the MI approaches, applying MICE-PMM produced, in general, the least biased estimates and better coverage for the incomplete covariates and better model performance for all mechanisms. However, this MI approach still produced biased regression coefficient estimates for the incomplete skewed continuous covariates when 50% or more cases had missing data imposed with a MCAR, MAR or combined mechanism. When the missingness depended on the incomplete covariates, i.e. MNAR, estimates were biased with more than 10% incomplete cases for all MI approaches. Conclusion: The results from this simulation study suggest that performing MICE-PMM may be the preferred MI approach provided that less than 50% of the cases have missing data and the missing data are not MNAR

    State of the art in selection of variables and functional forms in multivariable analysis: outstanding issues

    Background: How to select variables and identify functional forms for continuous variables is a key concern when creating a multivariable model. Ad hoc ‘traditional’ approaches to variable selection have been in use for at least 50 years. Similarly, methods for determining functional forms for continuous variables were first suggested many years ago. More recently, many alternative approaches to address these two challenges have been proposed, but knowledge of their properties and meaningful comparisons between them are scarce. Many outstanding issues in multivariable modelling remain to be resolved before a state of the art can be defined and evidence-supported guidance provided to researchers who have only a basic level of statistical knowledge. Our main aims are to identify and illustrate such gaps in the literature and present them at a moderate technical level to the wide community of practitioners, researchers and students of statistics. Methods: We briefly discuss general issues in building descriptive regression models, strategies for variable selection, different ways of choosing functional forms for continuous variables and methods for combining the selection of variables and functions. We discuss two examples, taken from the medical literature, to illustrate problems in the practice of modelling. Results: Our overview revealed that there is not yet enough evidence on which to base recommendations for the selection of variables and functional forms in multivariable analysis. Such evidence may come from comparisons between alternative methods. In particular, we highlight seven important topics that require further investigation and make suggestions for the direction of further research. Conclusions: Selection of variables and of functional forms are important topics in multivariable analysis. To define a state of the art and to provide evidence-supported guidance to researchers who have only a basic level of statistical knowledge, further comparative research is required

    Protocol for Physiotherapy OR Tvt Randomised Efficacy Trial (PORTRET): a multicentre randomised controlled trial to assess the cost-effectiveness of the tension free vaginal tape versus pelvic floor muscle training in women with symptomatic moderate to severe stress urinary incontinence

    Background: Stress urinary incontinence is a common condition affecting approximately 20% of adult women, causing a substantial individual (quality of life) and economic (119 million Euro/year spent on incontinence pads in the Netherlands) burden. Pelvic floor muscle training (PFMT) is regarded as first-line treatment, but only 15-25% of women will be completely cured. Approximately 65% will report that their condition improved, but long-term adherence to treatment is problematic. In addition, at longer-term (2-15 years) follow-up, 30-50% of patients will end up having surgery. Since 1996 a minimally invasive surgical procedure, the Tension-free Vaginal Tape (TVT), has rapidly become the gold standard in surgical treatment of stress urinary incontinence. With TVT 65-95% of women are cured. However, approximately 3-6% of women will develop symptoms of an overactive bladder, resulting in reduced quality of life. Because of its efficacy the TVT appears to be preferable to PFMT, but the two treatments and their costs have not been compared head-to-head in a randomised clinical trial. Methods/Design: A multi-centre randomised controlled trial will be performed for women between 35 and 80 years old with moderate to severe, predominantly stress, urinary incontinence, who have not received specialised PFMT or previous anti-incontinence surgery. Women will be assigned to either PFMT by a specialised physiotherapist for a standard of 9-18 sessions in a period of 6 months, or TVT(O) surgery. The main endpoint of the study is the subjective improvement of urinary incontinence. As a secondary outcome, the objective cure will be assessed from history and clinical parameters. Subjective improvement in quality of life will be measured by generic (EQ-5D) and disease-specific (Urinary Distress Inventory and Incontinence Impact Questionnaire) quality of life instruments. The economic endpoint is short-term (1 year) incremental cost-effectiveness in terms of costs per additional year free of urinary incontinence and costs per Quality Adjusted Life Year (QALY) gained. Finally, treatment strategy and patient characteristics will be combined in a prediction model, to allow for individual treatment decisions in future patients. Four hundred female patients will be recruited from over 30 hospitals in the Netherlands. Trial registration: Nederlands Trial Register: NTR 1248
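
    The economic endpoint is an incremental cost-effectiveness ratio, which can be sketched as below. The function name is hypothetical and the numbers in the usage line are placeholders, not trial data; the protocol reports no results. The same ratio applies whether the effect is years free of incontinence or QALYs gained.

```python
def incremental_cost_effectiveness(cost_new, cost_ref, effect_new, effect_ref):
    """Extra cost per extra unit of effect of a new strategy versus a reference.

    Effects can be years free of incontinence or QALYs, as long as both arms
    use the same unit.
    """
    delta_effect = effect_new - effect_ref
    if delta_effect == 0:
        raise ValueError("strategies have equal effect; the ratio is undefined")
    return (cost_new - cost_ref) / delta_effect

# Placeholder values for illustration only (not from the trial):
# mean cost in euros and mean incontinence-free years over one year, per arm.
example_ratio = incremental_cost_effectiveness(3000.0, 1000.0, 0.9, 0.5)
```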